How are you feeling right now? What do you expect to learn? Where did you hear about the course?
Also reflect on your learning experiences with the R for Health Data Science book and the Exercise Set 1: How did it work as a “crash course” on modern R tools and using RStudio? Which were your favorite topics? Which topics were most difficult? Some other comments on the book and our new approach of getting started with R Markdown etc.?
## [1] "The date today is"
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## [1] "2023-11-26 20:22:01 EET"
The text does not continue here
The learning2014 data is based on the international survey of Approaches to Learning. Observations with zero exam points have been removed from the data. "Deep", "stra" and "surf" are combination variables (averages of individual items measuring the same "dimension"). The data frame includes 166 rows and 7 columns, and all variables are numeric except gender, which is character.
From the plots you can see that the majority of participants identified as female, and most participants were 20-30 years old. There was a strong positive correlation between attitude and exam points (p < 0.001), and a negative correlation between deep learning (questions measuring the understanding of what is being studied) and attitude (p < 0.05), as well as between deep learning and strategic learning (making an effort to learn).
Based on the above results, I chose attitude, deep (relating to the "depth" of learning) and stra (learning strategies) as explanatory variables for exam points, and tested these with linear regression:

+ Model fit: the residuals are not symmetrically distributed around 0, so the model fit is not necessarily good.
+ The t-value for attitude is 6.203, which indicates a possible relationship between attitude and exam points. This is also seen in the coefficient table (Pr(>|t|)), where p < 0.001.
+ The t-values for deep (-0.998) and stra (1.793) are small, so there is likely no relationship between points and deep/stra (although p < 0.1 for stra, so there might be a tendency for learning strategies to be related to exam points).
+ Multiple R-squared for the model is 0.2097, so 20.97% of the variance of the response variable (points) could be explained by the predictor variables (attitude, deep and stra).

Based on the above, the variables deep and stra were removed from the next regression analysis (see below for results):

+ Model fit: the distribution of residuals is more symmetrical than when the other two variables were included in the model, indicating a better fit.
+ The t-value for attitude is 6.214 with p < 0.001, indicating a strong relationship between exam points and attitude.
+ Multiple R-squared is 0.1906, so 19.06% of the variance in points was explained by attitude alone.
Next, I tested whether the model meets the assumptions of linear regression (i.e., linearity, independence, homoscedasticity, normality of residuals, no multicollinearity and no endogeneity). The "Residuals vs. Fitted" plot can be used to detect non-linearity, unequal error variances, and outliers. In our data, observations 145, 56 and 35 seem to be outliers (but otherwise the residuals are quite evenly distributed around 0). The "Q-Q Residuals" plot is used to check the normality of residuals; the same outliers are visible here. The "Residuals vs. Leverage" plot is used to identify influential observations, i.e. points whose removal would markedly change the fitted model. In our data, the spread of standardized residuals increases as a function of leverage, which may indicate heteroscedasticity. Based on these results I would either remove the outliers from the data or try transformations (log10, square root, ...) to better meet the assumptions of the linear regression model.
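As a sketch of that follow-up idea (not run as part of this analysis; it assumes learning2014_readback as read in below), one could refit the model on a transformed response and redraw the same diagnostics:

```r
# Sketch only: refit with a square-root-transformed response
# and re-check the diagnostic plots used for model2
model2_sqrt <- lm(sqrt(points) ~ attitude, data = learning2014_readback)
par(mfrow = c(2, 2))
plot(model2_sqrt, which = c(1, 2, 5))
```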
library(GGally)
## Warning: package 'GGally' was built under R version 4.3.2
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(ggplot2)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ stringr 1.5.0
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
# Set the working directory and read the data into R
setwd("C:/Users/03114911/OneDrive - Valtion/Anne's PhD papers, results, plans etc/MBDP/Open data science/IODS-project")
learning2014_readback <- read_csv("Data/learning2014.csv", show_col_types = FALSE)
# Data structure and dimensions: these show basic information about the data
str(learning2014_readback) # all data is numeric, except gender is "character"("M"/"F")
## spc_tbl_ [166 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ gender : chr [1:166] "F" "M" "F" "M" ...
## $ age : num [1:166] 53 55 49 53 49 38 50 37 37 42 ...
## $ attitude: num [1:166] 3.7 3.1 2.5 3.5 3.7 3.8 3.5 2.9 3.8 2.1 ...
## $ deep : num [1:166] 3.58 2.92 3.5 3.5 3.67 ...
## $ stra : num [1:166] 3.38 2.75 3.62 3.12 3.62 ...
## $ surf : num [1:166] 2.58 3.17 2.25 2.25 2.83 ...
## $ points : num [1:166] 25 12 24 10 22 21 21 31 24 26 ...
## - attr(*, "spec")=
## .. cols(
## .. gender = col_character(),
## .. age = col_double(),
## .. attitude = col_double(),
## .. deep = col_double(),
## .. stra = col_double(),
## .. surf = col_double(),
## .. points = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
dim(learning2014_readback) # the data has 166 rows and 7 columns
## [1] 166 7
# Plot the relationships between the variables in learning2014_readback to see how they are
# related. Note: alpha = 0.1 inside aes() sets point transparency; it is not a significance threshold.
plot <- ggpairs(learning2014_readback, mapping = aes(col = gender, alpha = 0.1))
plot
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Multiple regression
model1 <- lm(points ~ attitude + deep + stra, data = learning2014_readback)
summary(model1)
##
## Call:
## lm(formula = points ~ attitude + deep + stra, data = learning2014_readback)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.5239 -3.4276 0.5474 3.8220 11.5112
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.3915 3.4077 3.343 0.00103 **
## attitude 3.5254 0.5683 6.203 4.44e-09 ***
## deep -0.7492 0.7507 -0.998 0.31974
## stra 0.9621 0.5367 1.793 0.07489 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.289 on 162 degrees of freedom
## Multiple R-squared: 0.2097, Adjusted R-squared: 0.195
## F-statistic: 14.33 on 3 and 162 DF, p-value: 2.521e-08
model2 <- lm(points ~ attitude, data = learning2014_readback)
summary(model2)
##
## Call:
## lm(formula = points ~ attitude, data = learning2014_readback)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.9763 -3.2119 0.4339 4.1534 10.6645
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.6372 1.8303 6.358 1.95e-09 ***
## attitude 3.5255 0.5674 6.214 4.12e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.32 on 164 degrees of freedom
## Multiple R-squared: 0.1906, Adjusted R-squared: 0.1856
## F-statistic: 38.61 on 1 and 164 DF, p-value: 4.119e-09
# Testing whether the model meets the assumptions of a linear regression
par(mfrow = c(2,2))
plot(model2, which = c(1,2,5))
date()
## [1] "Sun Nov 26 20:22:09 2023"
Here we go again…
This data includes information on student alcohol consumption, personal identifiers (age, sex, etc.), information about the family (education of parents, family size, etc.) and variables measuring the student’s success in school.
## [1] "school" "sex" "age" "address" "famsize"
## [6] "Pstatus" "Medu" "Fedu" "Mjob" "Fjob"
## [11] "reason" "guardian" "traveltime" "studytime" "schoolsup"
## [16] "famsup" "activities" "nursery" "higher" "internet"
## [21] "romantic" "famrel" "freetime" "goout" "Dalc"
## [26] "Walc" "health" "failures" "paid" "absences"
## [31] "G1" "G2" "G3" "alc_use" "high_use"
I decided to study the relationships between low/high alcohol consumption and parental education (“Medu” for mother’s education and “Fedu” for father’s education), student’s current health status (“health”) and quality of family relationships (“famrel”).
I hypothesize that (1) higher parental education is associated with lower alcohol consumption, (2) a poorer health score is associated with higher alcohol consumption, and (3) poorer family relationships are associated with higher alcohol consumption.
The mean age of the students was 16.6 years. The mean score for alcohol consumption (on a scale of 1 to 5, where 1 = very low consumption and 5 = very high consumption) was 1.9, indicating a fairly low level of alcohol consumption across the data set. The mean scores for the education levels of the students' mothers and fathers were 2.8 and 2.6, indicating a slightly higher level of education for the mothers compared to the fathers. Overall, the students' health levels were good (mean score 3.6), as was the quality of family relationships (mean score 3.9). The count summaries support the above findings.
## [1] 16.6
## [1] 1.9
## [1] 2.8
## [1] 2.6
## [1] 3.6
## [1] 3.9
## # A tibble: 7 × 2
## age count
## <dbl> <int>
## 1 15 81
## 2 16 102
## 3 17 97
## 4 18 77
## 5 19 11
## 6 20 1
## 7 22 1
## # A tibble: 9 × 2
## alc_use count
## <dbl> <int>
## 1 1 140
## 2 1.5 63
## 3 2 56
## 4 2.5 41
## 5 3 32
## 6 3.5 17
## 7 4 9
## 8 4.5 3
## 9 5 9
## # A tibble: 5 × 2
## Medu count
## <dbl> <int>
## 1 0 3
## 2 1 49
## 3 2 96
## 4 3 93
## 5 4 129
## # A tibble: 5 × 2
## Fedu count
## <dbl> <int>
## 1 0 2
## 2 1 73
## 3 2 105
## 4 3 97
## 5 4 93
## # A tibble: 5 × 2
## health count
## <dbl> <int>
## 1 1 46
## 2 2 42
## 3 3 80
## 4 4 62
## 5 5 140
## # A tibble: 5 × 2
## famrel count
## <dbl> <int>
## 1 1 8
## 2 2 18
## 3 3 64
## 4 4 180
## 5 5 100
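A sketch of how summaries like the ones above can be produced (assuming the joined data is in a data frame named alc):

```r
library(dplyr)

# Means of the selected variables, rounded to one decimal
alc %>%
  summarise(across(c(age, alc_use, Medu, Fedu, health, famrel),
                   ~ round(mean(.x), 1)))

# Count tables, one variable at a time
alc %>% count(age, name = "count")
alc %>% count(alc_use, name = "count")
```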
Below, the charts show that, in general, alcohol use seemed to be more frequent where either parent had a higher education. This is the opposite of my first hypothesis.
In addition, in contrast to my second and third hypotheses, neither a poor health score nor poor family relationships seemed to be associated with more frequent alcohol use.
Now, let's see whether there are statistically significant associations between the variables I've chosen and high alcohol use. Setting family = "binomial" tells R that the dependent variable high_use is binary (TRUE/FALSE).
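The model call that produced the output below can be sketched as follows (variable names as in the alc data):

```r
# Logistic regression of high alcohol use on the four chosen explanatory variables
glm1 <- glm(high_use ~ Medu + Fedu + health + famrel,
            data = alc, family = "binomial")
summary(glm1)
coef(glm1)
```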
The model summary shows a significant (p < 0.05) relationship between the quality of family relationships and high alcohol use. In addition, there is a trend of the health score being associated with high alcohol use (p < 0.1).
##
## Call:
## glm(formula = high_use ~ Medu + Fedu + health + famrel, family = "binomial",
## data = alc)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.18242 0.63354 -0.288 0.7734
## Medu -0.01537 0.13745 -0.112 0.9110
## Fedu 0.02092 0.13709 0.153 0.8787
## health 0.14777 0.08482 1.742 0.0815 .
## famrel -0.31087 0.12454 -2.496 0.0126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 452.04 on 369 degrees of freedom
## Residual deviance: 443.49 on 365 degrees of freedom
## AIC: 453.49
##
## Number of Fisher Scoring iterations: 4
## (Intercept) Medu Fedu health famrel
## -0.18242217 -0.01536970 0.02092328 0.14777237 -0.31086747
Since the education of either parent does not explain high alcohol use, let's remove those variables from the model and see how that looks. Let's also look at the coefficients as odds ratios and provide confidence intervals for them.
The summary shows a small improvement in the p-value for "health". From the coefficients we can see that the odds ratio for "health" is ~1.16, meaning that each one-point increase in the health score is associated with about 16% higher odds of high alcohol use. For family relationships (famrel), the odds ratio is ~0.73, meaning that better family relationships are associated with lower odds of high alcohol use. Note that the confidence intervals printed below appear to be on the log-odds scale (they were not exponentiated): the interval for health crosses zero, consistent with its non-significant p-value (~0.08), whereas the interval for famrel lies entirely below zero, consistent with its significant negative coefficient.
##
## Call:
## glm(formula = high_use ~ health + famrel, family = "binomial",
## data = alc)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.17461 0.54626 -0.320 0.7492
## health 0.14850 0.08465 1.754 0.0794 .
## famrel -0.31083 0.12454 -2.496 0.0126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 452.04 on 369 degrees of freedom
## Residual deviance: 443.51 on 367 degrees of freedom
## AIC: 449.51
##
## Number of Fisher Scoring iterations: 4
## odd_ratios 2.5 % 97.5 %
## (Intercept) 0.8397865 -1.25837614 0.89223235
## health 1.1600907 -0.01507081 0.31752786
## famrel 0.7328380 -0.55725045 -0.06718336
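The reduced model and its odds-ratio output can be sketched as follows; exponentiating confint() as well would put the intervals on the same (odds) scale as the ratios:

```r
# Reduced model without the parental-education variables
glm2 <- glm(high_use ~ health + famrel, data = alc, family = "binomial")

# Odds ratios and 95% confidence intervals, both exponentiated
# so that they are on the same odds scale
OR <- exp(coef(glm2))
CI <- exp(confint(glm2))
cbind(odds_ratio = OR, CI)
```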
Next, let's explore the predictive power of our model glm2. The prediction got 256 out of 259 FALSE observations right, so there the prediction works quite well. However, it was not able to predict a single one of the 111 TRUE observations (0/111), which sounds bad. Since this is my first time performing such an analysis, I'm not entirely sure how to interpret it, but I suspect that with weak predictors and TRUE making up only ~30% of the observations, the predicted probabilities rarely exceed 0.5, so almost every case gets classified as FALSE. (I also know that the variables health and famrel are nowhere near normally distributed, which may be part of the problem.) The loss-function results can be read as follows: always guessing FALSE (probability 0) misclassifies the TRUE cases, giving an average error of 0.3, while always guessing TRUE (probability 1) misclassifies the FALSE cases, giving an average error of 0.7. This is also reflected in the cross tabulation, where the prediction failed on all TRUE observations.
## prediction
## high_use FALSE TRUE
## FALSE 256 3
## TRUE 111 0
## [1] 0.3
## [1] 0.7
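A sketch of the prediction and loss-function steps above (assuming glm2 and alc as before; the helper loss_func is my naming and follows the course exercises):

```r
library(dplyr)

# Predicted probabilities and a TRUE/FALSE prediction at the 0.5 cutoff
alc <- mutate(alc,
              probability = predict(glm2, type = "response"),
              prediction  = probability > 0.5)

# Cross tabulation of actual vs. predicted values
table(high_use = alc$high_use, prediction = alc$prediction)

# Average number of wrong predictions for a constant guess
loss_func <- function(class, prob) mean(abs(class - prob) > 0.5)
loss_func(class = alc$high_use, prob = 0)  # always guess FALSE
loss_func(class = alc$high_use, prob = 1)  # always guess TRUE
```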
Regarding the three hypotheses set in the beginning, it looked like student health and the quality of family relationships were associated with alcohol use, so my last two hypotheses received some support. However, I found no support for the first hypothesis: the education level of either parent does not appear to be associated with alcohol consumption in students. Based on the odds ratios and predictions, my glm model did not really work well, but the time allocated for these exercises did not allow me to explore further what was "wrong" with it.
As final words I would like to declare that I put a lot of time and effort in these exercises, and although I might have not interpreted the results “correctly” and made a few mistakes, I really tried my best and gave it a 100 % effort. :) I hope you (whoever grades this) appreciate the effort I made (being new to all these things shows…) :)
The Boston data frame contains information related to the housing values in the suburbs of Boston. The data consists of 506 rows and 14 columns.
## Rows: 506
## Columns: 14
## $ crim <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
## $ zn <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
## $ indus <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
## $ chas <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ nox <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
## $ rm <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
## $ age <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
## $ dis <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
## $ rad <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ tax <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
## $ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
## $ black <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
## $ lstat <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
## $ medv <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
From the data summary we can see, for example, that the average per-capita crime rate ("crim") is 3.6, and that the median value of owner-occupied homes ("medv") is around $22.5k. As the summary and the boxplots also show, the majority of owner-occupied units were built before 1940 ("age").
## Warning: package 'corrplot' was built under R version 4.3.2
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
The correlation plot (significance level alpha = 0.001) shows significant negative correlations between the distance to employment centres ("dis") and the proportion of non-retail business ("indus"), nitrogen oxide concentration ("nox"), and the proportion of units built before 1940 ("age"). There is a strong positive correlation (p < 0.001) between the full-value property-tax rate per $10,000 ("tax") and accessibility to radial highways ("rad").
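A sketch of how such a correlation plot might be produced (the specific corrplot arguments are an assumption):

```r
library(MASS)      # provides the Boston dataset
library(corrplot)

# Correlation matrix of the Boston variables, rounded for display
cor_matrix <- round(cor(Boston), 2)
corrplot(cor_matrix, method = "circle", type = "upper", tl.cex = 0.7)
```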
Let's scale the data and print a summary of the scaled data: after scaling, all variable means are 0.
Next, let's create a categorical variable from the scaled crime rate in the Boston dataset, using quantiles as break points, and drop the old crime rate variable from the data set.
## crim zn indus chas
## Min. :-0.419367 Min. :-0.48724 Min. :-1.5563 Min. :-0.2723
## 1st Qu.:-0.410563 1st Qu.:-0.48724 1st Qu.:-0.8668 1st Qu.:-0.2723
## Median :-0.390280 Median :-0.48724 Median :-0.2109 Median :-0.2723
## Mean : 0.000000 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.007389 3rd Qu.: 0.04872 3rd Qu.: 1.0150 3rd Qu.:-0.2723
## Max. : 9.924110 Max. : 3.80047 Max. : 2.4202 Max. : 3.6648
## nox rm age dis
## Min. :-1.4644 Min. :-3.8764 Min. :-2.3331 Min. :-1.2658
## 1st Qu.:-0.9121 1st Qu.:-0.5681 1st Qu.:-0.8366 1st Qu.:-0.8049
## Median :-0.1441 Median :-0.1084 Median : 0.3171 Median :-0.2790
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.5981 3rd Qu.: 0.4823 3rd Qu.: 0.9059 3rd Qu.: 0.6617
## Max. : 2.7296 Max. : 3.5515 Max. : 1.1164 Max. : 3.9566
## rad tax ptratio black
## Min. :-0.9819 Min. :-1.3127 Min. :-2.7047 Min. :-3.9033
## 1st Qu.:-0.6373 1st Qu.:-0.7668 1st Qu.:-0.4876 1st Qu.: 0.2049
## Median :-0.5225 Median :-0.4642 Median : 0.2746 Median : 0.3808
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 1.6596 3rd Qu.: 1.5294 3rd Qu.: 0.8058 3rd Qu.: 0.4332
## Max. : 1.6596 Max. : 1.7964 Max. : 1.6372 Max. : 0.4406
## lstat medv
## Min. :-1.5296 Min. :-1.9063
## 1st Qu.:-0.7986 1st Qu.:-0.5989
## Median :-0.1811 Median :-0.1449
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.6024 3rd Qu.: 0.2683
## Max. : 3.5453 Max. : 2.9865
## [1] FALSE
## [1] "numeric"
## [1] "zn" "indus" "chas" "nox" "rm" "age" "dis"
## [8] "rad" "tax" "ptratio" "black" "lstat" "medv" "crime"
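The scaling and categorization steps above can be sketched as follows (variable names such as boston_scaled follow the course exercises and are an assumption):

```r
library(MASS)

# Standardize all variables (mean 0, sd 1)
boston_scaled <- as.data.frame(scale(Boston))

# Cut the scaled crime rate into four classes at its quantiles
bins <- quantile(boston_scaled$crim)
crime <- cut(boston_scaled$crim, breaks = bins, include.lowest = TRUE,
             labels = c("low", "med_low", "med_high", "high"))

# Replace the continuous crime rate with the categorical version
boston_scaled <- dplyr::select(boston_scaled, -crim)
boston_scaled <- data.frame(boston_scaled, crime)
```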
Then, let's divide the data into "train" and "test" sets, with 80% of the data in "train" and the rest in "test". Finally, let's fit an LDA model (with the crime rate as the target variable and all other variables as predictors) and draw the LDA biplot. Let's save the correct classes from the test data and predict new classes with the LDA model.
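A sketch of those steps (assuming boston_scaled with the categorical crime variable as above; object names are my own):

```r
library(MASS)

# 80/20 train/test split
n <- nrow(boston_scaled)
ind <- sample(n, size = n * 0.8)
train <- boston_scaled[ind, ]
test  <- boston_scaled[-ind, ]

# Fit LDA with crime as the target and all other variables as predictors
lda.fit <- lda(crime ~ ., data = train)

# Save the correct classes, remove them from the test set, and predict
correct_classes <- test$crime
test <- dplyr::select(test, -crime)
lda.pred <- predict(lda.fit, newdata = test)
table(correct = correct_classes, predicted = lda.pred$class)
```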
From the cross tabulation we can see that the prediction was correct for 12/26 in low, 14/23 in med_low, 18/29 in med_high and 24/24 in high. It looks like the model works better when predicting the higher crime-rate classes (med_high and high, accuracy ~62-100%), whereas for the lower classes the predictions were less accurate (low and med_low, accuracy ~46-61%).
## integer(0)
## predicted
## correct low med_low med_high high
## low 12 13 1 0
## med_low 3 14 6 0
## med_high 0 10 18 1
## high 0 0 0 24
Now, let's reload the Boston dataset, standardize it, and calculate the distances between the observations. Then let's run the k-means algorithm, determine the optimal number of clusters, and re-run the algorithm with that number of clusters.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1343 3.4625 4.8241 4.9111 6.1863 14.3970
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2662 8.4832 12.6090 13.5488 17.7568 48.8618
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
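A sketch of the distance and k-means steps above (the range of candidate cluster counts and the final choice of k are assumptions):

```r
library(MASS)
library(ggplot2)

set.seed(123)  # k-means starts from random centers
boston_scaled2 <- as.data.frame(scale(Boston))

# Euclidean and Manhattan distance matrices (summarised above)
dist_eu  <- dist(boston_scaled2)
dist_man <- dist(boston_scaled2, method = "manhattan")

# Total within-cluster sum of squares for 1..10 clusters; look for the "elbow"
k_max <- 10
twcss <- sapply(1:k_max, function(k) kmeans(boston_scaled2, k)$tot.withinss)
qplot(x = 1:k_max, y = twcss, geom = "line")  # qplot() is deprecated in ggplot2 >= 3.4, hence the warning

# Re-run k-means with the chosen number of clusters (e.g. 2)
km <- kmeans(boston_scaled2, centers = 2)
pairs(boston_scaled2, col = km$cluster)
```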
Let’s try the super-bonus..
… OK, something weird is happening here with the second plot but I can’t figure out what it is. Let’s just forget about the super-bonus. :’(
## [1] 404 13
## [1] 13 3
## Warning: package 'plotly' was built under R version 4.3.2
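For reference, a sketch of what the super-bonus 3D plot might look like (following the course exercise; it assumes lda.fit and the train set from the LDA step above, and the dimensions match the [404 13] and [13 3] outputs):

```r
library(plotly)

# Project the training data onto the LDA discriminants and
# draw a 3D scatter colored by the crime classes
matrix_product <- as.matrix(dplyr::select(train, -crime)) %*% lda.fit$scaling
matrix_product <- as.data.frame(matrix_product)

plot_ly(x = matrix_product$LD1, y = matrix_product$LD2,
        z = matrix_product$LD3, type = "scatter3d", mode = "markers",
        color = train$crime)
```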